Python Learn and Predict
Script Window
Start by selecting 'Learn & Predict Script' from the drop-down list.
Environment: select the Python environment in which you want to configure the model.
Packages: click the Packages button to view the packages that exist in the given environment.
Script: write the Python learn and predict script (see below for details).
Running Process Type
This selection determines the amount of data that is used to train the algorithm.
Fast: uses 20% of the data.
Accurate: uses 90% of the data.
Custom: enter a custom amount.
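The Running Process Type options can be pictured as a random split of the input rows. The sketch below is illustrative only. This page does not describe how Pyramid samples the data, so the helper `split_for_training`, the fixed seed, and the assumption that the rows not used for training form the testing set are all hypothetical. A Custom value is assumed to be a percentage, for example 50% passed as 0.5.

```python
import pandas as pd

# Hypothetical mapping of the Running Process Type options to training fractions.
TRAINING_FRACTIONS = {"Fast": 0.20, "Accurate": 0.90}


def split_for_training(data, fraction, seed=0):
    """Hypothetical helper: return (training_set, testing_set) for a training fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Randomly pick the requested share of rows for training.
    training_set = data.sample(frac=fraction, random_state=seed)
    # Assumption: the remaining rows are held out for evaluation.
    testing_set = data.drop(training_set.index)
    return training_set, testing_set


data = pd.DataFrame({"x": range(10), "y": range(10)})
training_set, testing_set = split_for_training(data, TRAINING_FRACTIONS["Fast"])
```

With 10 input rows, Fast trains on 2 rows and Accurate on 9, leaving 8 and 1 rows respectively for testing.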
Input Window
Input: lists the columns that are passed to the learn function as input.
Output Window
Output: the algorithm's output. Choose whether the output should be added to the existing table, or used to create a new table.
Save ML Model
Save Model: save the algorithm output as a machine learning model (see below to learn more).
Set as Target: set the Python node as the target in the data flow (see below to learn more).
Score Window
Score: the score that the algorithm assigns to the ML model, which indicates how reliable the model is. To produce this score, the algorithm compares the predictions of the model, trained on the training data, with the actual values in the testing set.
Learn and Predict Algorithm
The learn and predict algorithm must contain the following three functions:
Learn Function
def pyramid_learn(training_set):
Write a learn function that takes the training data (input) and returns the machine learning model (output).
To determine the size of the training data, choose an option under Running Process Type, below the Script window.
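A minimal sketch of a learn function, assuming the training set arrives as a pandas DataFrame with a numeric feature column `x` and a numeric target column `y` (both column names are hypothetical). It fits a one-variable least-squares line and returns it as a plain dict. The choice of model and the dict layout are illustrative, not a format that Pyramid requires.

```python
import pandas as pd

# Hypothetical column names; replace them with the columns in your own data set.
FEATURE = "x"
TARGET = "y"


def pyramid_learn(training_set):
    """Fit a one-variable least-squares line and return it as the model."""
    x = training_set[FEATURE].astype(float)
    y = training_set[TARGET].astype(float)
    # Closed-form simple linear regression: slope = cov(x, y) / var(x).
    slope = x.cov(y) / x.var()
    intercept = y.mean() - slope * x.mean()
    # A plain dict is used here for simplicity as the model object.
    return {"slope": float(slope), "intercept": float(intercept)}


training_set = pd.DataFrame({"x": [1, 2, 3, 4], "y": [3, 5, 7, 9]})
model = pyramid_learn(training_set)
```

For this sample data, the fitted line is y = 2x + 1.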
Eval Function
def pyramid_eval(model, testing_set):
Write a Pyramid eval function. The eval function evaluates the ML model produced by the learn function against a testing set (this is separate from the training set used by the learn function). It returns a model score indicating the reliability of the predictions, which is displayed in the 'Model Score' panel.
The eval function may or may not use the predict function. It typically generates predictions for the testing set and computes the score from them.
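A minimal sketch of an eval function, assuming the `{"slope", "intercept"}` dict model from a simple linear learn function and hypothetical column names `x` and `y`. The coefficient of determination (R²) is used purely as an example score; this page does not state which score Pyramid expects, only that a higher-quality model should receive a more reliable score.

```python
import pandas as pd

# Hypothetical column names; replace them with the columns in your own data set.
FEATURE = "x"
TARGET = "y"


def pyramid_eval(model, testing_set):
    """Score a {"slope", "intercept"} model on the testing set with R^2."""
    x = testing_set[FEATURE].astype(float)
    actual = testing_set[TARGET].astype(float)
    # Predictions are computed inline here rather than through the predict function.
    predicted = model["slope"] * x + model["intercept"]
    ss_res = float(((actual - predicted) ** 2).sum())
    ss_tot = float(((actual - actual.mean()) ** 2).sum())
    if ss_tot == 0.0:
        # R^2 is undefined when every actual value is identical.
        return float("nan")
    return 1.0 - ss_res / ss_tot


testing_set = pd.DataFrame({"x": [5, 6], "y": [11, 14]})
score = pyramid_eval({"slope": 2.0, "intercept": 1.0}, testing_set)
```

A score of 1.0 means the predictions match the testing set exactly; lower values mean a less reliable model.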
Predict Function
def pyramid_predict(model, df):
Write a predict function that applies the ML model to the entire data set. The output of the predict function is a pandas DataFrame containing the prediction results, which can be added as columns to an existing table or used to create a new table.
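A minimal sketch of a predict function, assuming the same hypothetical `{"slope", "intercept"}` dict model and feature column `x`. Returning only a new `prediction` column aligned to the input rows by index is an assumption; returning a copy of `df` with the column appended would be an equally plausible design.

```python
import pandas as pd

FEATURE = "x"  # hypothetical feature column


def pyramid_predict(model, df):
    """Apply a {"slope", "intercept"} model to every row of df."""
    predictions = model["slope"] * df[FEATURE].astype(float) + model["intercept"]
    # One output column, aligned to the input rows by index.
    return pd.DataFrame({"prediction": predictions}, index=df.index)


result = pyramid_predict({"slope": 2.0, "intercept": 1.0}, pd.DataFrame({"x": [0, 10]}))
```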
Save ML Model
Save Model
Select this option to save the algorithm's output as a machine learning model. This stores the existing results and allows you to add the ML model to another data flow later on, which is useful if you want to apply the ML model to new data. In this scenario, the algorithm runs faster because the previous results are stored: the learn function has already been run, so only the predict function runs.
To save an ML Model, select Save Model; name the ML model in the textbox below. Save and execute the master flow.
To use the saved ML model in another data flow, go to the Scripting tab and add the Scripting Model node to the data flow; it must be connected to a data set with the same structure (columns and data types) as the data set on which it was initially run.
From the Scripting Model node's Properties panel, go to the Scripting Model window. Under Model Type, choose Python. Under Model Name, select the saved ML model.
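Because a saved model only works on data with the same structure (columns and data types) as the original data set, it can help to compare the two before connecting them. The helper `structure_mismatches` below is a hypothetical sketch using pandas `DataFrame.dtypes`; Pyramid performs its own check, and this sketch ignores column order, which may or may not matter in Pyramid.

```python
import pandas as pd


def structure_mismatches(reference, new_data):
    """Hypothetical helper: list column and dtype differences between two DataFrames."""
    ref = reference.dtypes.to_dict()
    new = new_data.dtypes.to_dict()
    problems = []
    for col, dtype in ref.items():
        if col not in new:
            problems.append(f"missing column: {col}")
        elif new[col] != dtype:
            problems.append(f"{col}: expected {dtype}, got {new[col]}")
    for col in new:
        if col not in ref:
            problems.append(f"unexpected column: {col}")
    return problems


original = pd.DataFrame({"x": [1, 2], "y": [1.0, 2.0]})
incoming = pd.DataFrame({"x": [1.5, 2.5], "y": [1.0, 2.0]})
issues = structure_mismatches(original, incoming)
```

An empty list means the structures match; here `issues` reports that column `x` changed from an integer to a float type.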
Set As Target
This option is only enabled if you have selected Save Model. Choose this option to use the Python node as the target. In this scenario, the ETL data is not loaded into a database. As there's no ETL output, only the algorithm's learn and eval functions will run; the predict function will not run.
The ML model can then be connected to a different data flow, where the predict function will be run.
To use the Python node as the target, go to the Save ML Model window and select Save Model, then select Set as Target and name the model. Save and execute the master flow.
To use the saved ML model in another data flow, go to the Scripting tab and add the Scripting Model node to the data flow. The ML model can only be run if the data set has the same structure (columns and data types) as the data set on which the learn function was run.
From the Scripting Model node's Properties panel, go to the Scripting Model window. Under Model Type, choose Python. Under Model Name, select the saved ML model.
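The run order for Save Model, Set as Target, and the Scripting Model node can be summarized as a small simulation. This is a hypothetical sketch of the documented behavior, not Pyramid's code: `run_python_node`, `run_scripting_model_node`, and the in-memory `SAVED_MODELS` store are illustrative names, and the script is passed as a dict of the three functions.

```python
# Hypothetical in-memory stand-in for Pyramid's saved-model store.
SAVED_MODELS = {}


def run_python_node(script, training_set, testing_set, df,
                    save_model=False, set_as_target=False, model_name=None):
    """Simulate the documented run order; returns (score, predictions)."""
    if set_as_target and not save_model:
        raise ValueError("Set as Target is only enabled when Save Model is selected")
    if save_model and not model_name:
        raise ValueError("a saved model needs a name")
    model = script["learn"](training_set)
    score = script["eval"](model, testing_set)
    if save_model:
        SAVED_MODELS[model_name] = model
    if set_as_target:
        # The Python node is the target: there is no ETL output, so predict does not run.
        return score, None
    return score, script["predict"](model, df)


def run_scripting_model_node(model_name, predict, df):
    """A saved model is reused in another flow: only the predict function runs."""
    return predict(SAVED_MODELS[model_name], df)
```

In a normal run all three functions execute; with Set as Target the predict step is skipped, and it runs later when the saved model is used through the Scripting Model node.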